RFF: A Robust, FF-Based MDP Planning Algorithm for Generating Policies with Low Probability of Failure
Abstract
Over the years, researchers have developed many efficient techniques for planning in classical (i.e., deterministic) domains, such as the planners FF (Hoffmann and Nebel 2001), LPG (Gerevini, Saetti, and Serina 2003), SATPLAN (Kautz, Selman, and Hoffmann 2006), SGPLAN (Hsu et al. 2006), and others. Some of these planning techniques have been adapted for planning under uncertainty and have produced impressive performance results. For example, FF-REPLAN (Yoon, Fern, and Givan 2007) is a reactive online planning algorithm that has been demonstrated to be very effective for many MDP planning problems. As another example, planning-graph techniques inspired by FF have also been generalized to planning under nondeterminism and partial observability (Hoffmann and Brafman 2005; Bryce, Kambhampati, and Smith 2006). Finally, the conformant-planning approach of Palacios and Geffner (2006; 2007) describes how to translate any conformant planning problem into a classical problem for a classical planner. Their approach generates a single translation and can only find conformant solutions. In this paper, we describe our work, which investigates a middle ground between the approaches described above. In particular, we present an MDP planning algorithm, called RFF, for generating offline robust policies in probabilistic domains. Like both of the above approaches, RFF first creates a relaxation of an MDP planning problem by translating it into a deterministic planning problem in which each action corresponds to an effect of a probabilistic action in the original MDP, every such effect has a corresponding deterministic action, and there are no probabilities, costs, or rewards. In this relaxed planning problem, RFF computes a policy by using FF to generate successive execution paths leading from the initial states to the goal. The policy returned by RFF has a low probability of failing.
In our approach, we interpret this failure probability not as the probability of failing to reach a goal but as the probability of causing any replanning during execution. In this work, we use a Monte-Carlo simulation to compute the probability that a partial policy would fail during execution (i.e., the probability that the execution of ...
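The two ideas in the abstract, the relaxation into deterministic actions and the Monte-Carlo estimate of a partial policy's failure probability, can be illustrated with a minimal sketch. This is not the RFF implementation: the toy MDP, the names `determinize` and `failure_probability`, and the hand-written partial policy are all assumptions made for illustration, and no FF planner is invoked. Here "failure" is taken to mean that execution reaches a non-goal state the partial policy does not cover, so replanning would be needed.

```python
import random

# Hypothetical toy MDP: state -> action -> list of (probability, next_state).
MDP = {
    "s0": {"move": [(0.8, "s1"), (0.2, "s2")]},
    "s1": {"move": [(1.0, "goal")]},
    "s2": {"move": [(0.5, "s1"), (0.5, "s3")]},
    "s3": {"move": [(1.0, "goal")]},
}
GOALS = {"goal"}


def determinize(mdp):
    """Create one deterministic action per probabilistic effect.

    Probabilities are dropped; each outcome i of action `name`
    becomes its own deterministic action `name_i`.
    """
    return {
        state: {
            f"{name}_{i}": next_state
            for name, outcomes in actions.items()
            for i, (_prob, next_state) in enumerate(outcomes)
        }
        for state, actions in mdp.items()
    }


def failure_probability(mdp, policy, start, goals,
                        runs=10000, max_steps=100, seed=0):
    """Monte-Carlo estimate of the chance that a partial policy fails.

    A run fails when it reaches a non-goal state with no policy entry.
    Runs that exhaust `max_steps` are not counted as failures here
    (an assumption of this sketch; the toy MDP has no cycles).
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        state = start
        for _ in range(max_steps):
            if state in goals:
                break
            if state not in policy:
                failures += 1
                break
            outcomes = mdp[state][policy[state]]
            weights = [prob for prob, _ in outcomes]
            successors = [nxt for _, nxt in outcomes]
            state = rng.choices(successors, weights=weights)[0]
    return failures / runs


# A partial policy covering only the single path s0 -> s1 -> goal.
partial_policy = {"s0": "move", "s1": "move"}
# Extending the policy to the remaining states removes every failure.
full_policy = {s: "move" for s in MDP}

if __name__ == "__main__":
    print(determinize(MDP)["s0"])
    print(failure_probability(MDP, partial_policy, "s0", GOALS))
    print(failure_probability(MDP, full_policy, "s0", GOALS))
```

In this toy setting the single-path policy fails whenever `move` in `s0` leads to `s2`, so its estimate should be close to 0.2. Adding entries for the uncovered states lowers the estimate, which mirrors the idea of extending a policy until its failure probability is low.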
Similar resources
Incremental plan aggregation for generating policies in MDPs
Despite the recent advances in planning with MDPs, the problem of generating good policies is still hard. This paper describes a way to generate policies in MDPs by (1) determinizing the given MDP model into a classical planning problem; (2) building partial policies off-line by producing solution plans to the classical planning problem and incrementally aggregating them into a policy, and (3) ...
Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends
Recent domain-determinization techniques have been very successful in many probabilistic planning problems. We claim that traditional heuristic MDP algorithms have been unsuccessful due mostly to the lack of efficient heuristics in structured domains. Previous attempts like mGPT applied classical planning heuristics to an all-outcome determinization of MDPs without discount factor; yet, discounte...
POND-Hindsight: Applying Hindsight Optimization to POMDPs
We present the POND-Hindsight entry in the POMDP track of the 2011 IPPC. Similar to successful past entrants (such as FF-Replan and FF-Hindsight) in the MDP tracks of the IPPC, we sample action observations (similar to how FF-Replan samples action outcomes) and guide the construction of policy trajectories with a conformant (as opposed to classical) planning heuristic. We employ a number of tech...
An Enhanced HL-RF Method for the Computation of Structural Failure Probability Based On Relaxed Approach
The computation of structural failure probability is of vital importance in reliability analysis and may be carried out on the basis of the first-order reliability method using various iterative mathematical approaches such as Hasofer-Lind and Rackwitz-Fiessler (HL-RF). This method may not converge in complicated problems and nonlinear limit state functions, which usually shows itself in the f...
Resilience-Based Framework for Distributed Generation Planning in Distribution Networks
Low-probability, high-impact events, which cause heavy damage every year, seriously threaten the health of distribution networks. Hence, enhancing network resilience and continuity of power supply is receiving more attention than ever all over the world. In modern distribution networks, because of the increasing presence of distributed generation resources, an alternat...